WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6:

C07H 21/04, C12P 21/02, C12N 15/11, 
15/33, 15/48, 15/85 



A1



(11) International Publication Number: WO 98/12207

(43) International Publication Date: 26 March 1998 (26.03.98) 



(21) International Application Number: PCT/US97/16639

(22) International Filing Date: 18 September 1997 (18.09.97) 



(30) Priority Data: 

08/717,294 



20 September 1996 (20.09.96) US 



(71) Applicant: THE GENERAL HOSPITAL CORPORATION 

[US/US]; 55 Fruit Street, Boston, MA 02114 (US).

(72) Inventors: SEED, Brian; Apartment 5J, Nine Hawthorne Place,
Boston, MA 02114 (US). HAAS, Jurgen; lluberweg 13,
D-69198 Schriesheim (DE).

(74) Agent: ELBING, Karen, L.; Clark & Elbing LLP, 176 Federal
Street, Boston, MA 02110 (US).



(81) Designated States: AL, AM, AT, AU, AZ, BA, BB, BG, BR,
BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE,
GH, HU, ID, IL, IS, JP, KE, KG, KP, KR, KZ, LC, LK,
LR, LS, LT, LU, LV, MD, MG, MK, MN, MW, MX, NO,
NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM,
TR, TT, UA, UG, UZ, VN, YU, ZW, ARIPO patent (GH,
KE, LS, MW, SD, SZ, UG, ZW), Eurasian patent (AM, AZ,
BY, KG, KZ, MD, RU, TJ, TM), European patent (AT, BE,
CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL,
PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN,
ML, MR, NE, SN, TD, TG).



Published 

With international search report. 



(54) Title: HIGH LEVEL EXPRESSION OF PROTEINS 
(57) Abstract 



The invention features a synthetic gene encoding a protein normally expressed in a mammalian cell wherein at least one non-preferred 
or less preferred codon in the natural gene encoding the protein has been replaced by a preferred codon encoding the same amino acid. 
TABLE 4: Codon Frequency Table of the Native Factor VIII B Domain Deleted Gene

AA Codon Number /1000 Fraction
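
The column headings of Table 4 (AA, Codon, Number, /1000, Fraction) can be computed from any coding sequence. The following is a minimal sketch, not the procedure used to build the table: it assumes that "/1000" means occurrences per 1000 codons and that "Fraction" means the share of a codon among the synonymous codons for the same amino acid. The function name `codon_frequency_table` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_frequency_table(cds):
    """Return rows of (AA, codon, number, per-1000, fraction) for a coding sequence.

    Assumption: 'fraction' is the codon's share among synonymous codons.
    Codons containing characters other than A, C, G, T are skipped.
    """
    cds = "".join(cds.upper().split())
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TO_AA)
    total = sum(counts.values())

    # Total occurrences of each amino acid, used as the fraction denominator.
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODON_TO_AA[codon]] += n

    rows = []
    for codon, aa in sorted(CODON_TO_AA.items(), key=lambda kv: (kv[1], kv[0])):
        n = counts.get(codon, 0)
        per_thousand = 1000.0 * n / total if total else 0.0
        fraction = n / aa_totals[aa] if aa_totals[aa] else 0.0
        rows.append((aa, codon, n, per_thousand, fraction))
    return rows


if __name__ == "__main__":
    # First eight codons of the open reading frame in SEQ ID NO:40.
    for aa, codon, n, per_k, frac in codon_frequency_table("ATGGTGAGCAAGGGCGAGGAGCTG"):
        if n:
            print(f"{aa} {codon} {n:3d} {per_k:7.2f} {frac:.2f}")
```

Rows are sorted by amino acid and then codon, so synonymous codons appear together.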
Use

The synthetic genes of the invention are useful for expressing a protein
normally expressed in mammalian cells in cell culture (e.g., for commercial
production of human proteins such as hGH, TPA, Factor VIII, and Factor IX).
The synthetic genes of the invention are also useful for gene therapy. For
example, a synthetic gene encoding a selected protein can be introduced into
a cell which can express the protein to create a cell which can be
administered to a patient in need of the protein. Such cell-based gene
therapy techniques are well known to those skilled in the art; see, e.g.,
Anderson et al., U.S. Patent No. 5,399,349; Mulligan and Wilson, U.S. Patent
No. 5,460,959.

What is claimed is: 
1. A synthetic gene encoding a protein normally expressed in a
eukaryotic cell wherein at least one non-preferred or less preferred codon in a
natural gene encoding said protein has been replaced by a preferred codon
encoding the same amino acid, said synthetic gene being capable of expressing
said protein at a level which is at least 110% of that expressed by said natural
gene in an in vitro mammalian cell culture system under identical conditions.

2. The synthetic gene of claim 1 wherein said synthetic gene is 
capable of expressing said protein at a level which is at least 150% of that
expressed by said natural gene in an in vitro cell culture system under identical 
conditions. 

3. The synthetic gene of claim 1 wherein said synthetic gene is 
capable of expressing said protein at a level which is at least 200% of that 
expressed by said natural gene in an in vitro cell culture system under identical 
conditions. 

4. The synthetic gene of claim 1 wherein said synthetic gene is 
capable of expressing said protein at a level which is at least 500% of that 
expressed by said natural gene in an in vitro cell culture system under identical 
conditions. 

5. The synthetic gene of claim 1 wherein said synthetic gene 
comprises fewer than 5 occurrences of the sequence CG. 

6. The synthetic gene of claim 1 wherein at least 10% of the codons 
in said natural gene are non-preferred codons. 

7. The synthetic gene of claim 1 wherein at least 50% of the codons
in said natural gene are non-preferred codons. 

8. The synthetic gene of claim 1 wherein at least 50% of the non- 
preferred codons and less preferred codons present in said natural gene have 
been replaced by preferred codons. 

9. The synthetic gene of claim 1 wherein at least 90% of the
non-preferred codons and less preferred codons present in said natural gene
have been replaced by preferred codons.

10. The synthetic gene of claim 1 wherein said protein is normally 
expressed by a mammalian cell. 

11. The synthetic gene of claim 1 wherein said protein is a retroviral protein.

12. The synthetic gene of claim 1 wherein said protein is a lentiviral protein.

13. The synthetic gene of claim 11 wherein said protein is an HIV protein.

14. The synthetic gene of claim 13 wherein said protein is selected 
from the group consisting of gag, pol, and env. 

15. The synthetic gene of claim 13 wherein said protein is gp120.

16. The synthetic gene of claim 13 wherein said protein is gp160.

17. The synthetic gene of claim 1 wherein said protein is a human protein.

18. The synthetic gene of claim 1 wherein said human protein is
Factor VIII.

19. The synthetic gene of claim 1 wherein 20% of the codons are 
preferred codons. 

20. The synthetic gene of claim 18 wherein said gene has the 
coding sequence present in SEQ ID NO:42. 

21. The synthetic gene of claim 1 wherein said protein is green
fluorescent protein. 

22. The synthetic gene of claim 20 wherein said synthetic gene is 
capable of expressing said green fluorescent protein at a level which is at least 
200% of that expressed by said natural gene in an in vitro mammalian cell 
culture system under identical conditions. 

23. The synthetic gene of claim 20 wherein said synthetic gene is 
capable of expressing said green fluorescent protein at a level which is at least 
1000% of that expressed by said natural gene in an in vitro mammalian cell
culture system under identical conditions. 

24. The synthetic gene of claim 21 having the sequence depicted
in Figure 11 (SEQ ID NO:40).

25. An expression vector comprising the synthetic gene of claim 1.

26. The expression vector of claim 21, said expression vector
being a mammalian expression vector. 

27. A mammalian cell harboring the synthetic gene of claim 1.

28. A method for preparing a synthetic gene encoding a protein 
normally expressed by mammalian cells, comprising identifying non-preferred 
and less-preferred codons in the natural gene encoding said protein and 
replacing one or more of said non-preferred and less-preferred codons with a 
preferred codon encoding the same amino acid as the replaced codon. 
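
The method of claim 28 (identify non-preferred and less-preferred codons, then replace them with a preferred codon for the same amino acid) can be sketched as follows. This is an illustration under stated assumptions, not the applicants' procedure: the excerpt does not reproduce the specification's table of preferred codons, so the `PREFERRED` mapping below is a hypothetical placeholder, and the names `replace_nonpreferred`, `translate`, and `count_cg` are invented for the example. The CG count relates to the limit recited in claim 5.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: hypothetical one-codon-per-amino-acid preference table, used
# only for illustration; the specification's own table governs in practice.
# Met, Trp, and stop codons are left unchanged.
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG",
    "P": "CCC", "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "V": "GTG",
}


def translate(cds):
    """Translate a coding sequence; unrecognized codons become 'X'."""
    return "".join(CODON_TO_AA.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def replace_nonpreferred(cds):
    """Swap every codon that is not the assumed preferred codon for its
    amino acid. Returns (new_sequence, number_of_codons_replaced)."""
    out, replaced = [], 0
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA.get(codon)       # None for partial or unknown codons
        best = PREFERRED.get(aa, codon)   # keep codon if no preference is defined
        if best != codon:
            replaced += 1
        out.append(best)
    return "".join(out), replaced


def count_cg(seq):
    """Count occurrences of the dinucleotide CG."""
    return seq.count("CG")


if __name__ == "__main__":
    native = "ATGAAAGCTTTATCATAA"
    synthetic, n = replace_nonpreferred(native)
    assert translate(synthetic) == translate(native)  # same amino acids
    print(synthetic, n, count_cg(synthetic))
```

Checking that the translation is unchanged guards the "encoding the same amino acid" condition of the claim. A fuller implementation would also distinguish less-preferred from non-preferred codons, which this sketch does not attempt.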

Syngp120mn 1/18

1 CTCCACATCC ATTCTCCTCT AAAGGAGATA CCCCCCCAGA CACCCTCACC
51 TGCGGTGCCC AGCTCCCCAC GCTGAGCCAA GAGAAGCCCA GAAACCATCC
101 CCATCGGGTC T'lTGCAACCC CTCCCCACCT 7GTACCTCCT GGGGATGCTG
151 GTCGCTTCCG TCCTAGCCAC CGAGAAGCTG TCGGTGACCG TCTACTACGG
201 CGTGCCCGTG TGGAAGGAGG CCACCACCAC CCTGTTCTGC GCCACCGACG
251 CCAACGCGTA CSACACCGAG GTGCACAACG TGTGGGCCAC CCACCCCTGC
301 GT3CCCACCC ACCCCAACCC CCAGGAGGTG GACCTCGTGA ACGTGACCGA
351 CAACTTCAAC ATG7CGAAGA ACAACATGG7 CGAGCACATG CATGAGGACA
401 TCATCAGCCT GTGGGACCAG AGCCTGAAGC CC7CCGTGAA GCTCACCCCC
451 CTGTGCGTGA CCCTGAACTG CACCGACCTG AGGAACACCA CCAACACCAA
501 CAACACCACC QCCAACAACA ACAGCAACAG CGAGGGCACC ATCAAGGCCG
551 GCCACA7GAA CAACTGCAGC TTCAACATCA CCACCAGCAT CCCCGACAAC
601 ATCCAGAAGG- A37ACGCCCT GCTCTACAAG CTGGATATCC TGAGCATCGA
651 CAACGACAGC ACCAGCTACC GCCTGATCTC CTGCAACACC AGCCTGATCA
701 CCCAGGCCTG QCCCAAGATC AGCTTCGAGC CCATCCCCAT CCACTACTGC
751 GCCCCCGCCG G^CTTCGCCAT CCTGAAGTGC AACCACAAGA ACTTCAGCGC
801 CAAGGGCAGC TGCAAGAACC TCAGCACCCT GCAGTCCACC CACGGCATCC
851 GGCCGGTGGT C^AGCACCCAC CTCCTGCTCA ACGGCAGCCT GGCCGAGGAG
901 GACGTGGTCA TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATCAT
951 CG7GCACCTG AATGAGAGCG TGCAGATCAA C7CCACGCGT CCCAACTACA
1001 ACAAGCGCAA ^CGCATCCAC A7CCGCCCCC GGCCCGCCTT CTACACCACC
1051 AAGAACA7CA TCGGCACCAT CCGCCACGCC CACTGCAACA TCTCTAGAGC
1101 CAAGTGGAAC CACACCCTGC GCCAGA7CGT GAGCAAGCTC AAGGAGCAGT
1151 TCAAGAACAA GACCATCGTG TTCAACCAGA GCAGCGGCGG CGACCCCGAG
1201 A7CG7GATGC ACAGCTTCAA C7CCGGCGCC GAATTCTTCT ACTCCAACAC
1251 CAGCCCCCTC TTCAACAGCA CCTCCAACGC CAACAACACC TGCAACAACA
1301 CCACCGGCAC CAACAACAAT ATTACCCTCC AG7GCAAGAT CAAGCAGATC
1351 A7CAACA7CT CGCACGACG7 CGCCAACGCC ATCTACCCCC CCCCCATCGA
1401 CGGCCAGATC CGGTGCACCA CCAACATCAC CGGTCTCCTG CTCACCCGCC
1451 ACCCCGGCAA GGACACCGAC ACCAACGACA CCCAAA7C77 CCGCCCCGGC
1501 GGCGCCGACA TGCGCCACAA C7CCAGATCT CAGCTGTACA ACTACAAGGT
1551 CCTGACCATC ^CCCCCTCG GCGTCGCCCC CACCAACGCC AAGCCCCCCG
1601 TGGTGCAGCG CGAGAAGCGC TAAAGCGGCC GC (SEQ ID NO:34)

1451 GCGCCCCGAT CGGCGCCC7C T7CC7CCCC7 7CC7GCGGGC GGCCCCCAGC
1501 ACCATGGGCC CCCCCAGCG7 GACCCTGACC GTGCAGCCCC CCCTCCTCCT
1551 GAGCCGCATC CjTCCAGCACC AGAACAACCT CCTCCGCGCC ATCGAGGCCC
1601 AGCAGCATAT Q'ZTCCACCTC ACCCTGTGCC CCA7CAAGCA GCTCCACGCC
1651 CGCC7CC7GG CCGTGGAGCG CTACCTGAAG GACCAGCAGC TCCTCGGCTT
1701 C7CCGCCTCC TCCGGCAAGC 7GA7C7CCAC CACCACGCTA QCC7CGAACG
1751 CCTCCTGGAG CAACAACACC C7GGACGACA TC7CCAACAA CATGACC7CG
1801 A7GCAG7GGG AGCCCGAGAT CCATAACTAC ACCAGCC7GA 7CTACACCC7
1851 GC7CGAGAAG A-CCAGACCC AGCAGGAGAA GAACGACCAG GACCTCCTGG
1901 ACCTGGACAA CrGGGCGAGC C7C7GGAAC7 CGTTCGACAT CACCAAC7CG
1951 C7G7GG7ACA T7AAAA7C77 CA7CATCA77 CTGGGCGGCC TGGTGGGCCT
2001 CCGCA7CGTG T7CCCCG7Ce TGACCATCG7 GAACCCCG7G CGCCAGGGCT
2051 ACACCCCCCT C^AGCCTCCAC ACCCGGCCCC CCG7CCCCCC CCCCCCCCAC
2101 CGCCCCGACG C^ATCGAGGA GGAGGGCCCC GAGCGCGACC GCGACACCAG
2151 CGGCAGGCTC QTGCACGGCT 7CC7CCCGA7 CATCTCCC7C GACCTCCGCA
2201 GCC7GTTCCT ^7TCAGC7AC CACCACCGCG ACC7GCTGC7 GA7CGCCGCC
2251 CCCA7CG7CG AACTCCTAGG CCCCCGCGGC 7GGGAGG7GC TGAAGTACTG
2301 C-TCGAACC7C CTCCAGTA77 GGAGCCAGGA GC7GAACTCC AGCGCCGTGA
2351 GCCTGCTCAA CCCCACCCCC A7CGCCG7CG CCGAGGGCAC CGACCGCGTG
2401 A7CGAGG7GC TCCAGACGGC CCGGAGGGCG A7CC7GCACA 7CCCCACCCC
2451 CATCCGCCAG liGGCTCCACA GGGCCC7GCT C (SEQ ID NO:35)

FIGURE 3

1 GAATTCACGC GTAAGCTTGC CGCCACCATG GTGAGCAAGG GCGAGGAGCT
51 GTTCACCGGG GTGGTGCCCA TCCTGGTCGA GCTGGACGGC GACGTGAACG 
101 GCCACAAGTT CAGCGTGTCG GGCGAGGGCG AGGGCGATGC CACCTACGGC 
151 AAGCTGACCC TGAAGTTCAT CTGCACCACC GGCAAGCTGC CCGTGCCCTG 
201 GCCCACCCTC GTGACCACCT TCAGCTACGG CGTGCAGTGC TTCAGCCGCT 
251 ACCCCGACCA CATGAAGCAG CACGACTTCT TCAAGTCCGC CATGCCCGAA 
301 GGCTACGTCC AGGAGCGCAC CATCTTCTTC AAGGACGACG GCAACTACAA 
351 GACCCGCGCC GAGGTGAAGT TCGAGGGCGA CACCCTGGTG AACCGCATCG 
401 AGCTGAAGGG CATCGACTTC AAGGACGACG GCAACATCCT GGGGCACAAG 
451 CTGGAGTACA ACTACAACAG CCACAACGTC TATATCATGG CCGACAAGCA 
501 GAAGAACGGC ATCAAGGTGA ACTTCAAGAT CCGCCACAAC ATCGAGGACG 
551 GCAGCGTGCA GCTCGCCGAC CACTACCAGC AGAACACCCC CATCGGCGAC 
601 GGCCCCGTGC TGCTGCCCGA CAACCACTAC CTGAGCACCC AGTCCGCCCT 
651 GAGCAAAGAC CCCAACGAGA AGCGCGATCA CATGGTCCTG CTGGAGTTCG
701 TGACCGCCGC CGGGATCACT CACGGCATGG ACGAGCTGTA CAAGTAAAGC 
751 GGCCGCGGAT CC (SEQ ID NO: 40) 
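
Listings in this layout (a position index followed by groups of ten bases) can be joined into one contiguous sequence and checked for index consistency. The sketch below is illustrative only: the function name `parse_listing` is hypothetical, and it assumes each group consists of letters only and that a trailing annotation such as "(SEQ ID NO: 40)" starts with a non-letter character.

```python
def parse_listing(text):
    """Join a numbered sequence listing into one string.

    Returns (sequence, mismatches), where mismatches is a list of
    (stated_index, expected_index) pairs for lines whose leading position
    number does not equal one plus the number of bases read so far.
    """
    groups = []
    mismatches = []
    length = 0
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0].isdigit():
            stated = int(tokens[0])
            if stated != length + 1:
                mismatches.append((stated, length + 1))
            tokens = tokens[1:]
        for token in tokens:
            if not token.isalpha():
                break  # stop at annotations such as "(SEQ ID NO: 40)"
            groups.append(token.upper())
            length += len(token)
    return "".join(groups), mismatches


if __name__ == "__main__":
    listing = (
        "1 GAATTCACGC GTAAGCTTGC CGCCACCATG GTGAGCAAGG GCGAGGAGCT\n"
        "51 GTTCACCGGG GTGGTGCCCA TCCTGGTCGA GCTGGACGGC GACGTGAACG\n"
    )
    sequence, problems = parse_listing(listing)
    print(len(sequence), problems)
```

A non-empty mismatch list points at lines where either the position index or the number of bases on the preceding lines is wrong, which is a quick way to locate transcription errors.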

Native Factor VIII B domain deleted gene segment inserted in the
expression vector

1 AAGCTTAAAC CATGCCCATG CCGTCTCTGC AACCGCTCCC CACCTTGTAC
51 CTGCTGCGGA TGCTGGTCGC TTCCGTGCTA GCCGCCACCA GAAGATACTA
101 CCTCCCTCCA GTGCAACTGT CATCGGACTA TATGCAAAGT GATCTCCGTG
151 AGCTGCCTGT GGACGCAAGA TTTCCTCCTA CACTGCCAAA ATCTTTTCCA
201 TTCAACACCT CAGTCGTGTA CAAAAAGACT CTGTTTGTAG AATTCACGGA
251 TCACCTTTTC AACATCGCTA AGCCAAGGCC ACCCTGGATG GGTCTGCTAG
301 CTCCTACCAT CCACCCTGAG CTTTATCATA CACTCCTCAT TACACTTAAG
351 AACATGGCTT CCCATCCTGT CAGTCTTCAT GCTGTTGGTG TATCCTACTG
401 GAAAGCTTCT CAGGGAGCTC AATATGATGA TCACACCAGT CAAAGGCAGA
451 AAGAAGATGA TAAAGTCTTC CCTGGTGGAA CCCATACATA TGTCTGGCAG
501 GTCCTGAAAG AGAATGGTCC AATCGCCTCT GACCCACTGT GCCTTACCTA
551 CTCATATCTT TCTCATGTCG ACCTGGTAAA AGACTTCAAT TCAGGCCTCA
601 TTGGAGCCCT ACTAGTATGT AGAGAAGGGA GTCTCGCCAA GCAAAAGACA
651 CAGACCTTGC ACAAATTTAT ACTACTTTTT GCTOTATTTO ATOAAGGGAA
701 AAGTTCCCAC TCACAAACAA AGAACTCCTT GATCCAGGAT AGGGATGCTG
751 CATCTGCTCG GGCCTGGCCT AAAATGCACA CAGTCAATCG TTATGTAAAC
801 AGGTCTCTGC CAGGTCTGAT TGGATGCCAC AGGAAATCAG TCTATTGGCA
851 TGTGATTGGA ATCGCCACCA CTCCTCAAGT GCACTCAATA TTCCTCGAAG
901 CTCACACATT TCTTGTGACG AACCATCCCC AGCCGTCCTT GGAAATCTCG
951 CCAATAACTT TCCTTACTGC TCAAACACTC TTGATGGACC TTGGACAGTT
1001 TCTACTGTTT TGTCATATCT CTTCCCACCA ACATGATGGC ATGOAAGCTT
1051 ATGTCAAAGT AGACAGCTGT CCAGAGGAAC CCCAACTACG AATGAAAAAT
1101 AATGAAGAAC CCGAAGACTA TGATGATGAT CTTACTGATT CTGAAATCGA
1151 TGTGCTCAGG TTTGATGATG ACAACTCTCC TTCCTTTATC CAAATTCCCT
1201 CAGTTGCCAA GAAGCATCCT AAAACTTGGG TACATTACAT TGCTGCTGAA
1251 GAGGAGGACT GGGACTATGC TCCCTTAGTC CTCGCCCCCG ATGACAGAAG
1301 TTATAAAAGT CAATATTTGA ACAATGCCCC TCAGCGGATT GGTAGGAAGT
1351 ACAAAAAAGT CCGATTTATG GCATACACAG ATCAAACCTT TAAGACTCGT
1401 GAAGCTATTC AGCATGAATC AGGAATCTTG GGACCTTTAC TTTATGGGGA
1451 AGTTGCACAC ACACTCTTCA TTATATTTAA GAATCAAGCA AGCAGACCAT
1501 ATAACATCTA CCCTCACGGA ATCACTGATG TCCGTCCTTT GTATTCAACG
1551 AGATTACCAA AAGGTGTAAA ACATTTGAAG GATTTTCCAA TTCTGCCAGG
1601 AGAAATATTC AAATATAAAT GCACAGTGAC TGTAGAAGAT GGGCCAACTA
1651 AATCAGATCC TCGGTGCCTG ACCCGCTATT ACTCTAGTTT CGTTAATATG
1701 GAGAGAGATC TAGCTTCAGG ACTCATTGGC CCTCTCCTCA TCTGCTACAA
1751 AGAATCTGTA GATCAAACAG GAAACCAGAT AATGTCACAC AAGAGGAATG
1801 TCATCCTGTT TTCTGTATTT GATGAGAACC GAAGCTGGTA CCTCACAGAG
1851 AATATACAAC GCTTTCTCCC CAATCCACCT CGACTCCACC TTGAGGATCC
1901 AGAGTTCCAA GCCTCCAACA TCATGCACAG CATCAATCCC TATGTTTTTG
1951 ATAGTTTGCA GTTGTCAGTT TGTTTGCATG AGGTGGCATA CTGGTACATT
2001 CTAAGCATTG GAGCACAGAC TGACTTCCTT TCTGTCTTCT TCTCTGGATA
2051 TACCTTCAAA CACAAAATGG TCTATGAAGA CACACTCACC CTATTCCCAT
2101 TCTCACGAGA AACTCTCTTC ATGTCGATCG AAAACCCAGG TCTATGGATT
2151 CTGGGGTCCC ACAACTCAGA CTTTCGGAAC ACAGGCATGA CCCCCTTACT
2201 CAAGGTTTCT AGTTCTGACA AGAACACTGC TGATTATTAC GACGACAGTT
2251 ATGAAGATAT TTCAGCATAC TTGCTGAGTA AAAACAATGC CATTGAACCA
2301 AGAAGCTTCT CCCAGAATTC AAGACACCCT AGCACTAGCC AAAACCAATT
2351 TAATGCCACC CCACCAGTCT TGAAACGCCA TCAACGGGAA ATAACTCGTA
2401 CTACTCTTCA GTCAGATCAA GAGGAAATTG ACTATGATGA TACCATATCA
2451 GTTGAAATGA AGAAGGAAGA TTTTGACATT TATGATGAGG ATGAAAATCA
2501 CAGCCCCCCC AGCTTTCAAA AGAAAACACG ACACTATTTT ATTGCTGCAG
2551 TGGAGAGGCT CTGGGATTAT CCGATCAGTA CCTCCCCACA TCTTCTAAGA
2601 AACACGGCTC ACACTGGCAG TCTCCCTCAG TTCAACAAAG TTGTTTTCCA
2651 GGAATTTACT CATGCCTCCT TTACTCAGCC CTTATACCGT GGAGAACTAA
2701 ATCAACATTT GCGACTCCTC GGGCCATATA TAAGAGCACA ACTTGAAGAT
2751 AATATCATGG TAACTTTCAG AAATCAGGCC TCTCGTCCCT ATTCCTTCTA
2801 TTCTAGCCTT ATTTCTTATG AGGAAGATCA GAGGCAACGA GCAGAACCTA
2851 GAAAAAACTT TGTCAAGCCT AATCAAACCA AAACTTACTT TTGGAAAGTG
2901 CAACATCATA TCCCACCCAC TAAAGATGAG TTTGACTGCA AACCCTGGGC
2951 TTATTTCTCT GATGTTGACC TGGAAAAAGA TGTCCACTCA GCCCTGATTG
3001 GACCCCTTCT CGTCTGCCAC ACTAACACAC TGAACCCTGC TCATGCGAGA
3051 CAAGTGACAG TACAGGAATT TGCTCTGTTT TTCACCATCT TTGATGAGAC
3101 CAAAAGCTGG TACTTCACTG AAAATATGGA AAGAAACTCC AGCGCTCCCT
3151 GCAATATCCA GATGGAAGAT CCCACTTTTA AAGAGAATTA TCGCTTCCAT
3201 CCAATCAATG GCTACATAAT GGATACACTA CCTGGCTTAG TAATGGCTCA
3251 GGATCAAAGG ATTCGATCGT ATCTCCTCAG CATGGGCACC AATGAAAACA
3301 TCCATTCTAT TCATTTCAGT GGACATGTGT TCACTGTACG AAAAAAAGAG
3351 CAGTATAAAA TCGCACTCTA CAATCTCTAT CCACGTGTTT TTGAGACACT
3401 GGAAATGTTA CCATCCAAAG CTGCAATTTG GCGGGTGGAA TGCCTTATTG
3451 GCGAGCATCT ACATGCTGGG ATGAGCACAC TTTTTCTCGT GTACAGCAAT
3501 AAGTCTCAGA CTCCCCTGGG AATCCCTTCT GGACACATTA GAGATTTTCA
3551 CATTACAGCT TCACCACAAT ATCGACAGTG GGCCCCAAAG CTCGCCAGAC
3601 TTCATTATTC CCGATCAATC AATGCCTGGA GCACCAAGGA GCCCTTTTCT
3651 TGGATCAAGG TGGATCTGTT GGCACCAATG ATTATTCACG GCATCAAGAC
3701 CCAGCGTGCC CGTCAGAACT TCTCCACCCT CTACATCTCT CAGTTTATCA
3751 TCATGTATAG TCTTCATGGG AAGAAGTGGC AGACTTATCG AGGAAATTCC
3801 ACTGGAACCT TAATCCTCTT CTTTCGCAAT GTGGATTCAT CTGGGATAAA
3851 ACACAATATT TTTAACCCTC CAATTATTCC TCGATACATC CGTTTGCACC
3901 CAACTCATTA TACCATTCCC ACCACTCTTC CCATCGAGTT GATCGCCTCT
3951 GATTTAAATA CTTGCAGCAT GCCATTCGGA ATCGAGAGTA AAGCAATATC
4001 AGATGCACAG ATTACTGCTT CATCCTACTT TACCAATATG TTTGCCACCT
4051 GGTCTCCTTC AAAAGCTCGA CTTCACCTCC AAGGGACGAG TAATCCCTCG
4101 AGACCTCAGG TGAATAATCC AAAAGAGTGG CTCCAAGTGG ACTTCCAGAA
4151 GACAATGAAA GTCACAGGAG TAACTACTCA GGGACTAAAA TCTCTCCTTA
4201 CCAGCATGTA TGTGAAGGAG TTCCTCATCT CCAGCAGTCA AGATGGCCAT
4251 CAGTGGACTC TCTTTTTTCA CAATGCCAAA CTAAACCTTT TTCACGCAAA
4301 TCAAGACTCC TTCACACCTG TGGTGAACTC TCTACACCCA CCCTTACTGA
4351 CTCGCTACCT TCGAATTCAC CCCCAGAGTT GGGTGCACCA GATTGCCCTG
4401 AGGATGGAGG TTCTGGGCTG CGAGGCACAG GACCTCTACT GAGGGTGGCC
4451 ACTGCAGCAC CTCCCACTGC CCTCACCTCT CCCTCCTCAG CTCCAGGCCA
4501 GTGTCCCTCC CTGGCTTGCC TTCTACCTTT GTGCTAAATC CTAGCAGACA
4551 CTGCCTTGAA GCCTCCTGAA TTAACTATCA TCAGTCCTGC ATTTCTTTGG
4601 TGGGGCCCCA GGAGGGTGCA TCCAATTTAA CTTAACTCTT ACCGTCGACC
4651 TGCAGGCCCA ACGCGGCCGC

1 AACCTTAAAC CATGCCCATG GGGTCTCTGC AACCCCTGCC CACCTTGTAC
51 CTCCTCCCGA TGCTGGTCGC TTCCGTGCTA GCCGCCACCC GCCGCTACTA
101 CCTGGCCGCC GTGGAGCTGT CCTGGGACTA CATGCAGAGC GACCTGGGCG
151 AGCTCCCCGT GGACCCCCGC TTCCCCCCCC GCGTGCCCAA CAGCTTCCCC
201 TTCAACACCA GCGTGGTGTA CAACAAAACC CTCTTCGTGG AGTTCACCGA
251 CCACCTCTTC AACATTGCCA AGCCGCGCCC CCCCTGGATG CGCCTGCTGG
301 GCCCCACCAT CCACGCCCAG CTGTACOACA CCGTGGTGAT CACCCTCAAG
351 AACATCCCCA GCCACCCCGT CAGCCTGCAC CCCGTGGCCG TGAGCTACTG
401 GAAGGCCAGC GAGGGCGCCG AGTACGACGA CCAGACGTCC CAGCGCGAGA
451 AGGAGGACGA CAAGGTGTTC CCGGGGCGGA GCCACACCTA CGTGTGGCAG
501 GTGCTTAACG AGAACGGCCC TATGGCCAGC GACCCCCTGT GCCTGACCTA
551 CACCTACCTG AGCCACGTGG ACCTGGTGAA GGATCTGAAC AGCGGGCTGA
601 TCGGCGCCCT GCTGGTGTGT CGCGACGGCA CCCTGGCCAA GGAGAAAACC
651 CACACCCTCC ACAAGTTCAT CCTGCTGTTC GCCGTGTTCG ACGAGGGGAA
701 GAGCTGGCAC AGCGAGACTA AGAACACCCT CATGCACCAC CGCGACCCCG
751 CCAGCGCCCG CCCCTGCCCC AAGATGCACA CCGTTAACGG CTACGTGAAC
801 CGCACCCTCC CCGGCCTGAT CGCCTGCCAC CGCAAGAGCG TGTACTGGCA
851 CCTCATCGGC ATGGGCACCA CCCCTGAGGT GCACAGCATC TTCCTGGAGG
901 GCCACACCTT CCTGGTGCGC AACCACCGCC AGGCCACCCT GCAGATCAGC
951 CCCATCACCT TCCTGACTCC CCAGACCCTG CTGATGGACC TAGGCCAGTT
1001 CCTGCTCTTC TGCCACATCA GCAGCCACCA CCACCACGCC ATGGAGGCTT
1051 ACGTGAAGCT GGACAGCTGC CCCGAGGAGC CCCAGCTGCG CATGAAGAAC
1101 AACGACCACG CCCAGCACTA CGACGACGAC CTGACCCACA CCCAGATGGA
1151 TCTCGTACGC TTCCACGACG ACAACAGCCC CACCTTCATC CACATCCCCA
1201 GCGTGCCCAA GAAGCACCCT AAGACCTGGG TGCACTACAT CGCCGCCGAG
1251 GAGGAGGACT CGGACTACGC CCCGCTAGTA CTGGCCCCCG ACCACCGCAG
1301 CTACAAGAGC CAGTACCTGA ACAACGCCCC CCAGCCCATC GGCCGCAAGT
1351 ACAAGAAGGT GCGCTTCATG GCCTACACCG ACGAGACTTT CAAGACCCCC
1401 GAGGCCATCC AGCACGAGTC CGGCATCCTC GGCCCCCTGC TGTACGGCGA
1451 GCTGGGCGAC ACCCTGCTGA TCATCTTCAA GAACCACGCC AGCAGGCCCT
1501 ACAACATCTA CCCCCACCCC ATCACCGACG TGCCCCCCCT CTACACCCGC
1551 CGCCTCCCCA AGGGCGTGAA GCACCTGAAG CACTTCCCCA TCCTGCCCCG
1601 CGAGATCTTC AAGTACAAGT GGACCGTGAC CGTGGAGGAC GGCCCCACCA
1651 AGACCGACCC CCGCTGCCTG ACCCGCTACT ACACCAGCTT CGTGAACATG
1701 GAGCGCGACC TGCCCTCCGG ACTGATCGGC CCCCTGCTCA TCTCCTACAA
1751 CGAGAGCGTG GACCAGCCCC GCAACCAGAT CATGAGCGAC AACCCCAACC
1801 TGATCCTGTT CAGCGTGTTC GACCAGAACC GCAGCTGGTA TCTGACCGAG
1851 AACATCCAGC GCTTCCTGCC CAACCCCGCT CCCCTCCACC TCGAAGATCC
1901 CGAGTTCCAC GCCAGCAACA TCATGCACAC CATCAACCGC TACCTGTTCG
1951 ACAGCCTGCA GCTGAGCCTC TGCCTGCATG AGCTGGCCTA CTCGTACATC
2001 CTGACCAXCG CCCCCCAGAC CGACTTCCTC AGCCTGTTCT TCTCCGGGTA
2051 TACCTTCAAG CACAAGATGG TGTACGAGGA CACCCTCACC CTCTTCCCCT
2101 TCTCCGGCGA GACTCTGTTC ATGTCTATGG AGAACCCCGG CCTGTGGATT
2151 CTCGCCTGCC ACAACAGCGA CTTCCGCAAC CGCGGCATGA CTGCCCTCCT
2201 GAAAGTCTCC AGCTGCGACA ACAACACCCC CGACTACTAC GAGGACAGCT
2251 ACCACGACAT CTCCGCCTAC CTCCTCTCCA ACAACAACCC CATCGAGCCC
2301 CGCTCCTTCT CCCAAAACTC CCCCCACCCC ACCACGCGTC ACAAGCAGTT
2351 CAACGCCACC CCCCCCGTGC TCAACCCCCA CCACCGCGAG ATCACCCCCA
2401 CCACCCTGCA AAGCGACCAG GAGGAGATCG ACTACCACCA CACCATCAGC
2451 GTGGAGATGA AGAAGGAGGA CTTCGACATC TACGACGAGG ACCAGAACCA
2501 GACCCCCCGC TCCTTCCAAA AGAAAACCCG CCACTACTTC ATCGCCGCCG
2551 TGCAGCGCCT GTGCGACTAC CGCATCACCA CCACCCCCCA CGTCCTGCGC
2601 AACCCCCCCC AGACCGGCAG CGTGCCCCAC TTCAAGAACG TCGTGTTCCA
2651 GGAGTTCACC GACCCCACCT TCACCCAGCC CCTGTACCGC CGCGAGCTGA
2701 ACGAGCACCT GCGCCTGCTC GCCCCCTACA TCCGCCCCGA CGTGGAGGAC
2751 AACATCATCG TGACCTTCCG CAACCAAGCC TCCCGGCCCT ACTCCTTCTA
2801 CTCCTCCCTC ATCACCTACG AGGAGGACCA GCGCCAGGCC CCCGACCCCC
2851 GCAAGAACTT CCTGAAGCCC AACCACACTA ACACCTACTT CTGGAAGGTG
2901 CAGCACCACA TCCCCCCCAC CAAGGACGAG TTCGACTCCA AGCCCTGCGC
2951 CTACTTCAGC GACGTGGACC TGGAGAAGGA CCTGCACAGC CGCCTGATCG
3001 GCCCCCTGCT CGTCTGCCAC ACCAACACCC TGAACCCCCC CCACGCGACG
3051 CAGGTGACTG TGCACGAATT TGCCCTGTTC TTCACCATCT TCGACGACAC
3101 TAAGACCTGG TACTTCACCC AGAACATCCA GCGCAACTCC CGCGCCCCCT
3151 GCAACATCCA GATGGAAGAT CCCACCTTCA AGCAGAACTA CCGCTTCCAC
3201 GCCATCAACG GCTACATCAT GGACACCCTG CCCGGCCTGG TGATGGCCCA
3251 GGACCAGCGC ATCCGCTGGT ACCTGCTGTC TATGGGCAGC AACGAGAACA
3301 TCCACAGCAT CCACTTCAGC GGCCACGTTT TCACCGTGCG CAAGAAGGAG
3351 GAGTACAAGA TGGCCCTCTA CAACCTGTAC CCCGGCGTGT TCGAGACTCT
3401 GGAGATGCTG CCCAGCAAGG CCGCGATCTC CCCCGTCGAG TGCCTGATCG
3451 GCGAGCACCT CCACGCCGCC ATGAGCACCC TGTTCCTGGT GTACAGCAAC
3501 AACTGCCAGA CCCCCCTGCG CATGGCGAGC GCCCACATCC CCGACTTCCA
3551 CATCACCCCC AGCGGCCAGT ACGCCCACTC GGCTCCCAAG CTCGCCCCCC
3601 TGCACTACAC CCGCAGCATC AACGCCTGGT CGACCAAGGA GCCCTTCTCC
3651 TCGATCAAGG TGGACCTGCT CGCCCCCATG ATCATCCACG CCATCAACAC
3701 CCAGGGCGCC CGCCAGAACT TCAGCAGCCT GTACATCAGC CAGTTCATCA
3751 TCATGTACTC TCTAGACGGC AAGAAGTGGC AGACCTACCG CGGCAACAGC
3801 ACCCCCACCC TGATCGTGTT CTTCGGCAAC CTCGACAGCA CCCGCATCAA
3851 GCACAACATC TTCAACCCCC CCATCATCGC CCGCTACATC CGCCTGCACC
3901 CCACCCACTA CAGCATCCGC AGCACCCTGC CCATGGAGCT GATGCGCTCC
3951 CACCTCAACA GCTGCAGCAT GCCCCTGCGC ATGGACACCA AGGCCATCAG
4001 CGACGCCCAG ATCACCGCCT CCAGCTACTT CACCAACATG TTCGCCACCT
4051 GGAGCCCCAG CAAGGCCCGC CTCCACCTGC AGGGCCGCAC CAACGCCTCC
4101 CCCCCCCAGG TGAACAACCC CAAGGAGTGG CTCCACCTGG ACTTCCAGAA
4151 AACCATGAAG GTGACTCGCG TCACCACCCA GGGCGTCAAG ACCCTCCTCA
4201 CCAGCATGTA CGTGAAGGAG TTCCTGATCA CCAGCAGCCA GGACGGCCAC
4251 CAGTGGACCC TGTTCTTCCA AAACGGCAAG GTGAACGTCT TCCAGGGCAA
4301 CCAGGACAGC TTCACACCGC TCGTGAACAG CCTCGACCCC CCCCTGCTCA
4351 CCCGCTACCT GCGCATCCAC CCCCACACCT CCCTGCACCA CATCGCCCTC
4401 CCCATGGAGG TGCTGGGCTG CGACGCCCAG CACCTGTACT GAAGCGGCCG
4451 C